Song Ruizhuo, Wei Qinglai. Chaotic system optimal tracking using data-based synchronous method with unknown dynamics and disturbances. Chinese Physics B, 2017, 26(3): 030505
Chaotic system optimal tracking using data-based synchronous method with unknown dynamics and disturbances
Song Ruizhuo1, Wei Qinglai2,†
1 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to update synchronously. According to the tracking error and the reference dynamics, an augmented system is constructed, and the optimal tracking control problem is then defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. An off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton–Jacobi–Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, control, and disturbance, respectively. The weights of these networks compose the augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.
A chaotic system has complex nonlinear dynamics, and its response exhibits specific characteristics such as sensitivity to the initial condition, broad Fourier transform spectra, and irregular motion in phase space.[1–4] The study of chaotic systems dates back to 1963, when the American meteorologist Lorenz proposed a system of equations to simulate weather changes. The Lorenz system, as the first chaotic model, revealed the complex and fundamental behaviors of nonlinear dynamical systems.[5] The concept of the generalized Lorenz system was later extended by Lü, and a class of generalized Lorenz-like systems was discussed.[6] Until now, many efficient approaches have been proposed for controlling chaotic systems, such as the impulsive control method,[7–9] the adaptive dynamic programming (ADP) method,[10] and neural adaptive control.[11] Most of these methodologies consider the chaotic system without disturbance.
In this paper, we provide a new method to design an optimal tracking controller for chaotic systems with unknown dynamics and disturbances based on ADP algorithms. ADP, characterized by strong self-learning and adaptive abilities, has received significantly increased attention and has become an important brain-like intelligent optimal control method for nonlinear systems.[12–15] ADP algorithms include value iteration (VI) and policy iteration (PI), according to the different iterative methods.[16,17] In Ref. [18], a complex-valued ADP algorithm was discussed, where for the first time the optimal control problem of complex-valued nonlinear systems was successfully solved by PI. In Ref. [19], based on neurocognitive psychology, a novel controller built on multiple actor–critic structures was developed for unknown systems; the proposed controller traded off fast actions based on stored behavior patterns against real-time exploration using current input–output data. In Ref. [20], an effective off-policy integral reinforcement learning (IRL) algorithm was presented, which successfully solved the optimal control problem for completely unknown continuous-time systems with unknown disturbances. Note that the development of ADP promotes the development of adaptive control and neural network control. For example, in Ref. [21], a novel adaptive nonlinear controller was designed to achieve stochastic synchronization of complex networks, which is less conservative and may be more widely applicable than the traditional adaptive linear controller. In Ref. [22], the synchronization control of memristor-based recurrent neural networks with impulsive perturbations or boundary perturbations was studied; two kinds of controllers were designed so that the memristive neural networks with perturbations could converge to the equilibrium points, which evoke human memory patterns. In Ref. [23], pinning adaptive synchronization for uncertain complex dynamic networks with multi-link against network deterioration was proposed, and new synchronization criteria for networks with multi-link were derived to ensure that the synchronized states are locally or globally stable under uncertainty and deterioration. It is worth noting that ADP algorithms have been successfully applied to control chaotic systems. In Ref. [24], an optimal tracking control scheme was proposed for a class of discrete-time chaotic systems using the approximation-error-based ADP algorithm. In that work, the system dynamics was assumed to be known and the disturbance was not considered, but it lays a foundation for ADP-based chaotic system control.
It is known that optimal control has been extensively used in the effort to design controllers for nonlinear systems with disturbances.[25] In Ref. [26], significant insight into the design of such control problems was provided after the problem was formulated as a min-max two-player zero-sum (ZS) game. The optimal control in such a scenario is equivalent to finding the Nash equilibrium of the corresponding two-player ZS game,[27] which results in solving the so-called Hamilton–Jacobi–Isaacs (HJI) equation. During the last few years, strong connections between ADP and optimal control have prompted a major effort towards developing reinforcement learning algorithms that learn the solution to the HJI equation arising in the optimal regulation problem. In Refs. [28] and [29], ZS differential games were discussed in the framework of ADP. In Ref. [30], multiplayer ZS differential games for a class of uncertain nonlinear systems were studied. In Ref. [31], multiplayer non-zero-sum differential games were solved by the off-policy IRL method. This previous research provides a new perspective for the optimal tracking control of chaotic systems with disturbances.
Based on the previous research works, this paper studies the optimal tracking problem for chaotic systems with unknown dynamics and disturbances. First, an augmented system is constructed from the tracking error dynamics and the reference dynamics, and a new cost function is introduced for the optimal tracking problem. The tracking control problem is then transformed into a min-max optimization problem. The PI method is introduced to obtain the iterative cost function using the system dynamics. The off-policy ADP algorithm is then developed to find the solution of the tracking HJI equation online, using measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, control, and disturbance, respectively. The neural network (NN) implementation is given with convergence analyses. Finally, two examples are given, and the effectiveness of the proposed synchronous solution method for the optimal tracking control problem of chaotic systems is shown by the simulation results.
The rest of this paper is organized as follows. In Section 2, we present the motivations and preliminaries of the discussed problem. In Section 3, the synchronous solution is developed. In Section 4, the NN implementation is given with convergence analyses. In Section 5, two examples are given to demonstrate the effectiveness of the proposed scheme. In Section 6, the conclusion is drawn.
2. Problem formulation
Let us consider the chaotic system with disturbance described by
$\dot{x}(t) = f(x(t)) + g(x(t))u(t) + k(x(t))d(t)$,  (1)
where $x(t) \in \mathbb{R}^{n}$ is the chaotic system state, $u(t) \in \mathbb{R}^{m}$ is the control, $d(t) \in \mathbb{R}^{q}$ is the disturbance, $f(x)$ and $g(x)$ are unknown system dynamics with $f(0) = 0$, and $k(x)$ is the unknown disturbance gain. Actually, many nonlinear chaotic dynamical systems can be expressed as Eq. (1), such as the Lü system,[32] Chen system,[33,34] Lorenz system,[35,36] several variants of Chua's circuits,[37,38] and the Duffing oscillator.[39,40]
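For instance (an illustrative sketch, not taken from the paper), the classical Lorenz system can be put in the input-affine form of Eq. (1). The Python snippet below assumes the standard parameters σ = 10, ρ = 28, b = 8/3 and, purely for illustration, identity input gains g(x) = k(x) = I:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Lorenz drift f(x); writing the system in the form (1):
# dx/dt = f(x) + g(x) u + k(x) d
def f(x, sigma=10.0, rho=28.0, b=8.0/3.0):
    return np.array([sigma * (x[1] - x[0]),
                     rho * x[0] - x[1] - x[0] * x[2],
                     x[0] * x[1] - b * x[2]])

# Illustrative choice: identity input gains g(x) = k(x) = I (an assumption,
# not the paper's setting); u and d are external input signals.
def dynamics(t, x, u_fn, d_fn):
    return f(x) + u_fn(t, x) + d_fn(t, x)

# The uncontrolled, undisturbed trajectory traces the familiar chaotic attractor.
sol = solve_ivp(dynamics, (0.0, 20.0), [1.0, 1.0, 1.0],
                args=(lambda t, x: np.zeros(3), lambda t, x: np.zeros(3)),
                max_step=0.01)
print(sol.y[:, -1])  # final state on the attractor
```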
Let $x_d(t)$ be the constant reference trajectory and we have
$\dot{x}_d(t) = f(x_d(t)) + g(x_d(t))u_d(t) + k(x_d(t))d_d(t)$.  (2)
The focus of this paper is to find an optimal control $u(x)$ that makes the chaotic system (1) track the given trajectory $x_d(t)$ and, at the same time, makes a cost function optimal. Therefore, the tracking error system is first defined as
$\dot{e}(t) = f_e(e(t)) + g_e(e(t))u_e(t) + k_e(e(t))d_e(t)$,  (3)
where the tracking error $e(t) = x(t) - x_d(t)$, $u_e(t) = u(t) - u_d(t)$, $d_e(t) = d(t) - d_d(t)$, $u_d(t)$ and $d_d(t)$ are the control and disturbance of system (2), and $f_e$, $g_e$, and $k_e$ denote the error dynamics obtained by subtracting Eq. (2) from Eq. (1). In this paper, we assume that $f_e(0) = 0$ and $f_e$ is Lipschitz. In the tracking error system (3), $d_e(t)$ is the disturbance, which can be seen as another input and makes the cost function maximum. Then the ZS differential game will be adopted for the optimal tracking control problem. The cost function is defined as
$J(e(t), u_e, d_e) = \int_t^{\infty} \big( e^{\rm T}(\tau) Q e(\tau) + u_e^{\rm T}(\tau) R u_e(\tau) - d_e^{\rm T}(\tau) S d_e(\tau) \big) {\rm d}\tau$,  (4)
where $Q$, $R$, and $S$ are positive definite matrices.
Putting Eqs. (2) and (3) together yields the augmented system
$\dot{X}(t) = F(X(t)) + G(X(t))u_e(t) + K(X(t))d_e(t)$,  (5)
where $X = [e^{\rm T}, x_d^{\rm T}]^{\rm T}$, $F(X) = [f_e^{\rm T}(e), \dot{x}_d^{\rm T}]^{\rm T}$, $G(X) = [g_e^{\rm T}(e), 0]^{\rm T}$, and $K(X) = [k_e^{\rm T}(e), 0]^{\rm T}$. By using the augmented system (5), the cost function (4) becomes
$J(X(t)) = \int_t^{\infty} \big( X^{\rm T}(\tau) Q_1 X(\tau) + u_e^{\rm T}(\tau) R u_e(\tau) - d_e^{\rm T}(\tau) S d_e(\tau) \big) {\rm d}\tau$,  (6)
where $Q_1 = {\rm diag}(Q, 0)$.
Then the two-player ZS differential game is
$V(X(t)) = \min_{u_e} \max_{d_e} J(X(t), u_e, d_e)$,  (7)
where $V(X)$ is the value of the game. In this paper, we assume that the two-player optimal control problem has a unique solution, i.e., the Nash condition holds:
$\min_{u_e} \max_{d_e} J(X(t), u_e, d_e) = \max_{d_e} \min_{u_e} J(X(t), u_e, d_e)$.  (8)
By the Leibniz formula, differentiating Eq. (6) yields the nonlinear ZS game Bellman equation, which is given in terms of the Hamiltonian function as
$H(X, \nabla V, u_e, d_e) = X^{\rm T} Q_1 X + u_e^{\rm T} R u_e - d_e^{\rm T} S d_e + (\nabla V)^{\rm T} \big( F(X) + G(X)u_e + K(X)d_e \big) = 0$,  (9)
where $\nabla V = \partial V / \partial X$. The stationary conditions are
$\partial H / \partial u_e = 0$,  (10)
$\partial H / \partial d_e = 0$.  (11)
According to Eqs. (9)–(11), we have the optimal control and the disturbance
$u_e^{*} = -\frac{1}{2} R^{-1} G^{\rm T}(X) \nabla V^{*}$,  (12)
$d_e^{*} = \frac{1}{2} S^{-1} K^{\rm T}(X) \nabla V^{*}$.  (13)
Substituting Eqs. (12) and (13) into the Bellman equation (9), we can derive $V^{*}$ from the solution of the HJI equation
$X^{\rm T} Q_1 X + (\nabla V^{*})^{\rm T} F(X) - \frac{1}{4} (\nabla V^{*})^{\rm T} G(X) R^{-1} G^{\rm T}(X) \nabla V^{*} + \frac{1}{4} (\nabla V^{*})^{\rm T} K(X) S^{-1} K^{\rm T}(X) \nabla V^{*} = 0$.  (14)
The HJI equation provides the solution to the optimal control problem for the ZS game. When it can be solved, it provides an optimal control in state-variable feedback (i.e., closed-loop) form. The Bellman equation is a partial differential equation for the value. That is, given any stabilizing feedback control policies $u_e$ and $d_e$ yielding finite values, the solution to the Bellman equation (9) is the value given by Eq. (6).
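For intuition, consider a standard special case (an illustrative assumption, not a derivation from the paper): when the augmented dynamics are linear and the value is quadratic, Eqs. (12)–(14) reduce to linear policies and a game algebraic Riccati equation:

```latex
% Assume (for illustration) linear augmented dynamics and a quadratic value:
%   F(X) = AX,  G(X) = B,  K(X) = K,  V^*(X) = X^T P X,  P = P^T > 0,
% so that \nabla V^* = 2PX. Then Eqs. (12) and (13) become the linear policies
u_e^{*} = -R^{-1}B^{\mathrm{T}}PX, \qquad d_e^{*} = S^{-1}K^{\mathrm{T}}PX,
% and the HJI equation (14) reduces to the game algebraic Riccati equation
A^{\mathrm{T}}P + PA + Q_{1} - PBR^{-1}B^{\mathrm{T}}P + PKS^{-1}K^{\mathrm{T}}P = 0.
```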
As the system dynamics is unknown, a data-based synchronous method will be established to solve the HJI equation (14).
3. Data-based synchronous method
In this section, the PI algorithm is first given. Then the off-policy learning is used to transform the PI algorithm to a synchronous method without system dynamics.
The PI algorithm starts from an initial admissible control pair $(u_e^{(0)}, d_e^{(0)})$. Then for iterative step $i = 0, 1, 2, \ldots$, the value $V^{(i+1)}$ is obtained by
$X^{\rm T} Q_1 X + (u_e^{(i)})^{\rm T} R u_e^{(i)} - (d_e^{(i)})^{\rm T} S d_e^{(i)} + (\nabla V^{(i+1)})^{\rm T} \big( F(X) + G(X)u_e^{(i)} + K(X)d_e^{(i)} \big) = 0$,  (15)
and the policy pair is updated by
$u_e^{(i+1)} = -\frac{1}{2} R^{-1} G^{\rm T}(X) \nabla V^{(i+1)}$,  (16)
$d_e^{(i+1)} = \frac{1}{2} S^{-1} K^{\rm T}(X) \nabla V^{(i+1)}$.  (17)
From Eqs. (15)–(17), we can see that the system dynamics is necessary for the PI algorithm. Therefore, the following synchronous method is given based on the PI algorithm and off-policy learning.
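For concreteness, here is a minimal model-based sketch of the iteration (15)–(17) under the linear-quadratic assumptions above (the matrices A, B, K, Q1, R, S are illustrative data; this version uses the dynamics explicitly, which the synchronous method below avoids):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative linear-quadratic ZS game: X' = A X + B u + K d (assumed data).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.0], [0.5]])
Q1 = np.eye(2)
R = np.eye(1)
S = 4.0 * np.eye(1)  # disturbance attenuation weight, S > 0

Lu = np.zeros((1, 2))  # initial admissible control gain, u = -Lu X
Ld = np.zeros((1, 2))  # initial disturbance gain,        d =  Ld X

for i in range(50):
    # Policy evaluation, Eq. (15): closed-loop Lyapunov equation for P
    Ac = A - B @ Lu + K @ Ld
    rhs = -(Q1 + Lu.T @ R @ Lu - Ld.T @ S @ Ld)
    P = solve_continuous_lyapunov(Ac.T, rhs)   # solves Ac^T P + P Ac = rhs
    # Policy improvement, Eqs. (16)-(17)
    Lu_new = np.linalg.solve(R, B.T @ P)
    Ld_new = np.linalg.solve(S, K.T @ P)
    if max(np.abs(Lu_new - Lu).max(), np.abs(Ld_new - Ld).max()) < 1e-10:
        break
    Lu, Ld = Lu_new, Ld_new

print("P =\n", P)  # converges to the game-ARE solution
```

In the data-based setting of this paper, the same fixed point is instead reached through Eq. (23) from measured data, without using A, B, or K.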
Let $u_e^{(i)}$ and $d_e^{(i)}$ be obtained by Eqs. (16) and (17); then the original system (5) is rewritten as
$\dot{X} = F(X) + G(X)u_e^{(i)} + K(X)d_e^{(i)} + G(X)(u_e - u_e^{(i)}) + K(X)(d_e - d_e^{(i)})$,  (18)
where $u_e$ and $d_e$ are the behavior control and disturbance actually applied to generate the data.
According to Eq. (18), we have
$\dot{V}^{(i+1)}(X) = (\nabla V^{(i+1)})^{\rm T} \big( F(X) + G(X)u_e^{(i)} + K(X)d_e^{(i)} \big) + (\nabla V^{(i+1)})^{\rm T} G(X)(u_e - u_e^{(i)}) + (\nabla V^{(i+1)})^{\rm T} K(X)(d_e - d_e^{(i)})$.  (19)
From Eqs. (16) and (17), we have
$(\nabla V^{(i+1)})^{\rm T} G(X) = -2 (u_e^{(i+1)})^{\rm T} R$,  (20)
$(\nabla V^{(i+1)})^{\rm T} K(X) = 2 (d_e^{(i+1)})^{\rm T} S$.  (21)
Then equation (19) becomes
$\dot{V}^{(i+1)}(X) = (\nabla V^{(i+1)})^{\rm T} \big( F(X) + G(X)u_e^{(i)} + K(X)d_e^{(i)} \big) - 2(u_e^{(i+1)})^{\rm T} R (u_e - u_e^{(i)}) + 2(d_e^{(i+1)})^{\rm T} S (d_e - d_e^{(i)})$.  (22)
Thus, substituting Eq. (15) and integrating Eq. (22) over $[t, t+\Delta t]$, the off-policy Bellman equation for the ZS game is expressed as
$V^{(i+1)}(X(t+\Delta t)) - V^{(i+1)}(X(t)) = \int_t^{t+\Delta t} \big( -X^{\rm T} Q_1 X - (u_e^{(i)})^{\rm T} R u_e^{(i)} + (d_e^{(i)})^{\rm T} S d_e^{(i)} - 2(u_e^{(i+1)})^{\rm T} R (u_e - u_e^{(i)}) + 2(d_e^{(i+1)})^{\rm T} S (d_e - d_e^{(i)}) \big) {\rm d}\tau$.  (23)
In Eq. (23), knowledge of the system dynamics is avoided: every term can be computed from measured state and input data. In the following section, the NN implementation procedure is presented with convergence analysis.
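To make the data requirement concrete, here is a minimal sketch (with hypothetical helper names; the integrals in Eq. (23) are approximated by the trapezoidal rule from sampled measurements) of assembling the pieces of one off-policy Bellman sample:

```python
import numpy as np

def bellman_data_tuple(ts, Xs, ues, des, ue_i, de_i, Q1, R, S):
    """Integrals appearing in Eq. (23) over one interval [ts[0], ts[-1]].

    ts : (N,) sample times; Xs : (N, n) measured augmented states;
    ues, des : (N, m), (N, q) applied behavior control and disturbance;
    ue_i, de_i : callables giving the i-th iterative policies at a state.
    """
    quad, mis_u, mis_d = [], [], []
    for X, ue, de in zip(Xs, ues, des):
        ui, di = ue_i(X), de_i(X)
        quad.append(X @ Q1 @ X + ui @ R @ ui - di @ S @ di)
        mis_u.append(R @ (ue - ui))   # weights the unknown u_e^{(i+1)} term
        mis_d.append(S @ (de - di))   # weights the unknown d_e^{(i+1)} term
    dt = np.diff(ts)
    quad = np.asarray(quad)
    quad_int = np.sum(0.5 * (quad[1:] + quad[:-1]) * dt)  # trapezoidal rule
    return quad_int, np.asarray(mis_u), np.asarray(mis_d)
```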
4. NN implementation
The NN implementation procedure is first given in this section. Then the convergence of the NN weight is analyzed.
The neural network expression of CNN is given as
$V^{(i+1)}(X) = W_c^{\rm T} \phi_c(X) + \varepsilon_c(X)$,  (24)
where $W_c$ is the ideal weight of the critic network, $\phi_c(X)$ is the activation function, and $\varepsilon_c(X)$ is the residual error. Let the estimation of $W_c$ be $\hat{W}_c$. Then the estimation of $V^{(i+1)}$ is
$\hat{V}^{(i+1)}(X) = \hat{W}_c^{\rm T} \phi_c(X)$,  (25)
and the corresponding gradient estimation is
$\nabla \hat{V}^{(i+1)}(X) = (\nabla \phi_c(X))^{\rm T} \hat{W}_c$.  (26)
The neural network expression of ANN is
$u_e^{(i+1)}(X) = W_a^{\rm T} \phi_a(X) + \varepsilon_a(X)$,  (27)
where $W_a$ is the ideal weight of the action network, $\phi_a(X)$ is the activation function, and $\varepsilon_a(X)$ is the residual error. Let $\hat{W}_a$ be the estimation of $W_a$; then the estimation of $u_e^{(i+1)}$ is
$\hat{u}_e^{(i+1)}(X) = \hat{W}_a^{\rm T} \phi_a(X)$.  (28)
The neural network expression of DNN is
$d_e^{(i+1)}(X) = W_d^{\rm T} \phi_d(X) + \varepsilon_d(X)$,  (29)
where $W_d$ is the ideal weight of the disturbance network, $\phi_d(X)$ is the activation function, and $\varepsilon_d(X)$ is the residual error. Let $\hat{W}_d$ be the estimation of $W_d$; then the estimation of $d_e^{(i+1)}$ is
$\hat{d}_e^{(i+1)}(X) = \hat{W}_d^{\rm T} \phi_d(X)$.  (30)
Substituting Eqs. (25), (28), and (30) into Eq. (23), we can define the equation error as
$e^{(i)}(t) = \hat{V}^{(i+1)}(X(t+\Delta t)) - \hat{V}^{(i+1)}(X(t)) - \int_t^{t+\Delta t} \big( -X^{\rm T} Q_1 X - (\hat{u}_e^{(i)})^{\rm T} R \hat{u}_e^{(i)} + (\hat{d}_e^{(i)})^{\rm T} S \hat{d}_e^{(i)} - 2(\hat{u}_e^{(i+1)})^{\rm T} R (u_e - \hat{u}_e^{(i)}) + 2(\hat{d}_e^{(i+1)})^{\rm T} S (d_e - \hat{d}_e^{(i)}) \big) {\rm d}\tau$.  (31)
Expanding Eq. (31) with the NN expressions (25), (28), and (30), we have
$e^{(i)}(t) = \hat{W}_c^{\rm T} \Delta\phi_c(t) + \int_t^{t+\Delta t} \big( X^{\rm T} Q_1 X + (\hat{u}_e^{(i)})^{\rm T} R \hat{u}_e^{(i)} - (\hat{d}_e^{(i)})^{\rm T} S \hat{d}_e^{(i)} + 2\phi_a^{\rm T} \hat{W}_a R (u_e - \hat{u}_e^{(i)}) - 2\phi_d^{\rm T} \hat{W}_d S (d_e - \hat{d}_e^{(i)}) \big) {\rm d}\tau$,  (32)
where $\Delta\phi_c(t) = \phi_c(X(t+\Delta t)) - \phi_c(X(t))$. By the Kronecker product identity $a^{\rm T} B c = (c^{\rm T} \otimes a^{\rm T}) {\rm vec}(B)$, it follows that
$2\phi_a^{\rm T} \hat{W}_a R (u_e - \hat{u}_e^{(i)}) = 2\big( (u_e - \hat{u}_e^{(i)})^{\rm T} R \otimes \phi_a^{\rm T} \big) {\rm vec}(\hat{W}_a)$,  (33)
$2\phi_d^{\rm T} \hat{W}_d S (d_e - \hat{d}_e^{(i)}) = 2\big( (d_e - \hat{d}_e^{(i)})^{\rm T} S \otimes \phi_d^{\rm T} \big) {\rm vec}(\hat{W}_d)$.  (34)
Then we can define
$\theta_c(t) = \Delta\phi_c(t)$,  (35)
$\theta_a(t) = 2\int_t^{t+\Delta t} \big( (u_e - \hat{u}_e^{(i)})^{\rm T} R \otimes \phi_a^{\rm T} \big)^{\rm T} {\rm d}\tau$,  (36)
$\theta_d(t) = -2\int_t^{t+\Delta t} \big( (d_e - \hat{d}_e^{(i)})^{\rm T} S \otimes \phi_d^{\rm T} \big)^{\rm T} {\rm d}\tau$,  (37)
$\pi(t) = -\int_t^{t+\Delta t} \big( X^{\rm T} Q_1 X + (\hat{u}_e^{(i)})^{\rm T} R \hat{u}_e^{(i)} - (\hat{d}_e^{(i)})^{\rm T} S \hat{d}_e^{(i)} \big) {\rm d}\tau$.  (38)
Therefore, equation (32) becomes
$e^{(i)}(t) = \hat{W}^{\rm T} \theta(t) - \pi(t)$,  (39)
where $\hat{W} = [\hat{W}_c^{\rm T}, {\rm vec}^{\rm T}(\hat{W}_a), {\rm vec}^{\rm T}(\hat{W}_d)]^{\rm T}$ is the augmented weight matrix and $\theta(t) = [\theta_c^{\rm T}(t), \theta_a^{\rm T}(t), \theta_d^{\rm T}(t)]^{\rm T}$.
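The vectorization step in Eqs. (33) and (34) relies only on the standard identity $a^{\rm T} B c = (c^{\rm T} \otimes a^{\rm T}) {\rm vec}(B)$, which can be checked numerically (a quick illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)
B = rng.standard_normal((4, 3))
c = rng.standard_normal(3)

lhs = a @ B @ c
# vec(B) stacks columns, i.e., column-major (Fortran-order) flattening
rhs = np.kron(c, a) @ B.flatten(order="F")
print(np.isclose(lhs, rhs))  # True
```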
Define the objective function
$E(t) = \frac{1}{2} \big( e^{(i)}(t) \big)^{2}$.  (40)
According to the gradient descent method, we have the update law of $\hat{W}$ as
$\dot{\hat{W}} = -\alpha \frac{\partial E(t)}{\partial \hat{W}} = -\alpha\, \theta(t)\, e^{(i)}(t)$,  (41)
where $\alpha$ is a positive number (the learning rate).
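A minimal sketch of one Euler step of the update law (41) follows; the array shapes, step size, and toy data are illustrative assumptions:

```python
import numpy as np

def weight_update(W_hat, theta, pi, alpha=0.1, dt=0.01):
    """One Euler step of the gradient-descent law (41).

    W_hat : (p,) augmented weight vector [W_c; vec(W_a); vec(W_d)]
    theta : (p,) data vector from Eqs. (35)-(37) for one interval
    pi    : scalar target from Eq. (38)
    """
    e = W_hat @ theta - pi          # equation error, Eq. (39)
    grad = theta * e                # dE/dW for E = e^2 / 2, Eq. (40)
    return W_hat - alpha * dt * grad

# Toy usage: weights converge so that W @ theta approaches pi on fixed data
W = np.zeros(5)
theta = np.array([1.0, -0.5, 0.2, 0.0, 0.3])
pi = 0.7
for _ in range(5000):
    W = weight_update(W, theta, pi)
print(W @ theta)  # approximately 0.7
```

In the algorithm itself, fresh data vectors θ(t) and targets π(t) from successive intervals drive the same update, so all three network weights adapt synchronously.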
Based on the NN implementation and off-policy learning, the convergence of the synchronous method can be established: the augmented weight matrix is proven to be UUB, and the tracking error system is convergent.
5. Simulation study
In this section, two simulation examples are given to demonstrate the effectiveness of the proposed method.
5.1. Example 1
Consider the chaotic system described by the following differential equation:[41]
$\dot{x} = \begin{bmatrix} (25\beta + 10)(x_2 - x_1) \\ (28 - 35\beta)x_1 - x_1 x_3 + (29\beta - 1)x_2 \\ x_1 x_2 - \frac{8+\beta}{3} x_3 \end{bmatrix} + g(x)u + k(x)d$,  (57)
where $x = [x_1, x_2, x_3]^{\rm T}$ is the system state, $u$ is the control input, and $d$ is the disturbance input. Let the gain matrices $g(x)$ and $k(x)$ be given by
(58)
(59)
Here we let β = 0, so that system (57) becomes the Lorenz system when the perturbations are not present. The trajectory of system (57) is shown in Fig. 1.
In this example, the desired trajectory is $x_d$. We select hyperbolic tangent functions as the activation functions of the critic, action, and disturbance networks. The structures of the critic, action, and disturbance networks are 3−8−1, 3−8−3, and 3−8−3, respectively. The initial weight $W$ is selected arbitrarily from (−1, 1). For the cost function, $Q$, $R$, and $S$ in the utility function are identity matrices of appropriate dimensions. After 150 time steps, the simulation results are obtained. Figures 2 and 3 show the control and disturbance input trajectories. Based on these inputs, the tracking error is given in Fig. 4, which shows that the tracking error is convergent. The chaotic system state is demonstrated in Fig. 5. It is clear that the chaotic system tracks the given trajectory.
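A minimal sketch of this simulation setup is given below. The network sizes and weight initialization follow the text; the constant reference, identity input gains, and fixed first-layer weights are illustrative assumptions, since these details are not specified above, and the online tuning of the weights by Eq. (41) is omitted to keep only the closed-loop rollout structure:

```python
import numpy as np
from scipy.integrate import solve_ivp

beta = 0.0  # beta = 0: Lorenz case of system (57)

def f57(x):
    return np.array([(25*beta + 10) * (x[1] - x[0]),
                     (28 - 35*beta) * x[0] - x[0]*x[2] + (29*beta - 1) * x[1],
                     x[0]*x[1] - (8 + beta)/3 * x[2]])

# 3-8-1 critic and 3-8-3 action/disturbance tanh networks,
# weights initialized arbitrarily in (-1, 1), as in the text.
rng = np.random.default_rng(1)
Wc = rng.uniform(-1, 1, (8, 1))  # critic weights (tuned by Eq. (41) online)
Wa = rng.uniform(-1, 1, (8, 3))
Wd = rng.uniform(-1, 1, (8, 3))
V1 = rng.uniform(-1, 1, (8, 3))  # assumed fixed first-layer weights

def net(W, V, x):
    return W.T @ np.tanh(V @ x)

x_d = np.array([0.0, 0.0, 0.0])  # illustrative constant reference

def closed_loop(t, x):
    e = x - x_d
    u = net(Wa, V1, e)           # ANN control, cf. Eq. (28)
    d = net(Wd, V1, e)           # DNN disturbance, cf. Eq. (30)
    return f57(x) + u + d        # identity gains assumed for illustration

sol = solve_ivp(closed_loop, (0.0, 1.5), [1.0, -1.0, 2.0], max_step=0.01)
print(sol.y[:, -1])
```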
5.2. Example 2
In this example, the desired trajectory is $x_d$. The matrices $Q$, $R$, and $S$ in the utility function are identity matrices of appropriate dimensions. For the critic, action, and disturbance networks, the activation functions are hyperbolic tangent functions, and the structures are 3−8−1, 3−8−3, and 3−8−3, respectively. The initial weight $W$ is selected arbitrarily from (−1, 1). After 100 time steps, the simulation results are shown in Figs. 7–10. In Figs. 7 and 8, the control and disturbance inputs are given, which are convergent. Under the action of these inputs, the tracking error trajectories are displayed in Fig. 9, which converge to zero. At last, the closed-loop chaotic system state based on the inputs is demonstrated in Fig. 10. It can be seen that the proposed method is effective in making the chaotic system track the given trajectories.
6. Conclusion
An optimal tracking control method for a chaotic system with unknown dynamics and disturbances is proposed in this paper. The tracking error dynamics and the reference generator dynamics make up the augmented system, and a new cost function is introduced for the optimal tracking control problem. PI is introduced to solve the min-max optimization problem. The off-policy learning method is applied to update the iterative cost function using only measured data and without any knowledge of the system dynamics. The CNN, ANN, and DNN are used to approximate the cost function, control, and disturbance, respectively, with convergence analyses. It is proven that the closed-loop tracking error system is convergent. Simulation results are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem. Future research will apply the proposed approach to the control problem for a class of systems with interconnection terms, and analyze the convergence of PI in ZS games.